web-client: formant-weighted radial waveform for speaking state #348
Merged
Stacks on merged PR #338. Makes the 24-bar radial waveform respond to phoneme changes, not just overall amplitude. Top-3 peak bins in the frequency spectrum act as formant proxies (approximating F1/F2/F3); bars within 6 bins of a peak get up to 1.8× their raw height. The ring shape visibly rotates/morphs when the vowel changes ("ahhh" → "eee"), reading as "the avatar is articulating" rather than "a VU meter next to the avatar."

No new deps. No CSS changes. No asset changes. ~40 added lines inside the existing `startSpeakingDetection()` closure in src/web-client.ts.

Preserves the existing graceful-degradation path: silence → `findPeaks` leaves all peakIdx=-1 → boost=1.0 → identical to prior behavior.

Ring stays in the canvas margin (radii 24–30) outside the 44×44 image, where it is proven visible at current display size. Bigger SVG-overlay redesign for the hero screen is tracked as a separate follow-up per the Mini/MacBook two-bot consensus today.

Spec at notes/avatar-formant-waveform-spec.md. Research memo: notes/avatar-animation-research.md (Option D).

Verified:
- npx tsc --noEmit --skipLibCheck clean
- Embedded browser-JS extracted and `node --check`-ed cleanly (per feedback_web_client_embedded_js_no_ts rule: no TS syntax inside template literal)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
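For readers without the diff open, here is a minimal sketch of what a top-K local-maximum scan like `findPeaks` could look like, including the silence case where every slot stays at -1. It is not the code from src/web-client.ts: the signature, the `floor` silence threshold, and the plateau handling are assumptions made for illustration, and the real function lives as plain JS inside a template literal rather than as typed TypeScript.

```typescript
// Hypothetical sketch, not the PR's implementation.
const NO_PEAK = -1;

/**
 * Linear scan for the k strongest local maxima in a byte spectrum
 * (values 0..255, the range AnalyserNode.getByteFrequencyData fills).
 * Returns bin indices ordered strongest first; unused slots stay at -1.
 * `floor` is an assumed silence threshold; the PR does not state one.
 */
function findPeaks(buf: Uint8Array, k: number = 3, floor: number = 16): number[] {
  const peakIdx: number[] = new Array(k).fill(NO_PEAK);
  const peakVal: number[] = new Array(k).fill(0);

  for (let i = 1; i < buf.length - 1; i++) {
    const v = buf[i];
    // Keep only local maxima above the floor: strictly higher than the
    // left neighbour and at least as high as the right one, so a flat
    // plateau is counted once.
    if (v < floor || v <= buf[i - 1] || v < buf[i + 1]) continue;

    // Insert into the descending top-k list and drop the overflow.
    for (let s = 0; s < k; s++) {
      if (v > peakVal[s]) {
        peakIdx.splice(s, 0, i);
        peakVal.splice(s, 0, v);
        peakIdx.length = k;
        peakVal.length = k;
        break;
      }
    }
  }
  return peakIdx;
}
```

With an all-zero buffer no bin clears the floor, so the result is `[-1, -1, -1]`, which is the condition the graceful-degradation path relies on.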
sonichi (Owner, Author) commented on Apr 15, 2026
sonichi left a comment
MacBook review: LGTM. Clean formant extraction — top-3 peak detection with proximity-weighted boost. The 6-bin radius and 0.8 max boost factor are reasonable defaults. No TS-in-JS mistakes this time. Merge when ready.
Summary
Stacks on merged #338. Makes the 24-bar radial waveform respond to phoneme changes, not just overall amplitude. Top-3 peak bins in the frequency spectrum act as formant proxies (F1/F2/F3); bars within 6 bins of a peak get up to 1.8× their raw height. The ring shape visibly rotates/morphs when the vowel changes ("ahhh" → "eee"), which reads as "the avatar is articulating" rather than "a VU meter next to the avatar."
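The weighting described above can be sketched as follows. The 6-bin radius and the 0.8 maximum boost (a 1.8× ceiling) come from this PR; the function names, the linear falloff with distance, and the absence of clamping are assumptions, since the description does not say how the boost tapers between the peak and the edge of the radius.

```typescript
// Hypothetical sketch; names and the linear falloff are assumptions.
const PEAK_RADIUS = 6; // bins, per the PR description
const MAX_BOOST = 0.8; // multiplier tops out at 1.0 + 0.8 = 1.8

/** Multiplier for the bar at `bin`, given peak bin indices (-1 = empty slot). */
function formantBoost(bin: number, peakIdx: readonly number[]): number {
  let minDist = 999; // sentinel meaning "no peak found"
  for (const p of peakIdx) {
    if (p < 0) continue; // skip empty slots
    minDist = Math.min(minDist, Math.abs(bin - p));
  }
  if (minDist > PEAK_RADIUS) return 1.0; // far from every peak, or no peaks
  // Assumed linear falloff: 1.8 on the peak, 1.0 at the edge of the radius.
  return 1.0 + MAX_BOOST * (1 - minDist / PEAK_RADIUS);
}

/** Formant-weighted bar value from a raw byte sample (0..255). Not clamped. */
function barValue(raw: number, bin: number, peakIdx: readonly number[]): number {
  return (raw / 255) * formantBoost(bin, peakIdx);
}
```

When every slot is -1, `minDist` keeps its sentinel value, the multiplier is exactly 1.0, and `barValue` reduces to the raw `raw / 255` amplitude.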
Why this approach
Converged with @sutando#9708 in #dev today after discussing three scope options:
What changes
- `src/web-client.ts`: inside the existing `startSpeakingDetection()` closure, add `findPeaks()` (linear top-K local-max scan, K=3), and replace the bar-draw loop's `val = buf[…]/255` with a formant-weighted value.
Test plan
- `npx tsc --noEmit --skipLibCheck` clean
- `node --check`-ed clean (per `feedback_web_client_embedded_js_no_ts`)
- `speaking=false` gates the draw: existing behavior preserved (no peaks found → `boost=1.0` → raw amplitude → same as today)
Graceful degradation
When the spectrum is low (silence), `findPeaks` leaves all `peakIdx[k] = -1`, `minDist` stays 999, `boost = 1.0`, `val = raw`. Output is byte-identical to the pre-PR behavior. If the whole analyser pipeline fails, `speaking = false` still toggles the border CSS on the `<img>`, so the visor keeps its color animation even without the canvas overlay.
Scope explicitly NOT in this PR
References
- `notes/avatar-animation-research.md`: Option D
- `notes/avatar-formant-waveform-spec.md`
🤖 Generated with Claude Code